Piecewise Strong Convexity of Neural Networks
We study the loss surface of a feed-forward neural network with ReLU non-linearities, regularized with weight decay. We show that the regularized loss function is piecewise strongly convex on an important open set which, under some conditions, contains all of its global minimizers. This is used to prove that local minima of the regularized loss function in this set are isolated, and that every differentiable critical point in this set is a local minimum, partially addressing an open problem posed at the Conference on Learning Theory (COLT) 2015. We also apply our result to linear neural networks to show that, with weight decay regularization, there are no non-zero critical points in a norm ball attaining training error below a given threshold. Finally, an experimental section validates our theoretical work, showing that the regularized loss function is almost always piecewise strongly convex when restricted to stochastic gradient descent trajectories for three standard image classification problems.
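The experimental claim above suggests a simple check a reader can reproduce. The sketch below is our illustration, not the authors' code: it estimates the smallest eigenvalue of the Hessian of a weight-decay-regularized loss L(θ) = MSE(θ) + λ‖θ‖² at points along an SGD trajectory, using power iteration on a shifted Hessian-vector product; a positive estimate indicates local strong convexity on the current ReLU activation region. The network size, data, λ, and the spectral shift are arbitrary choices for illustration.

```python
# Hypothetical sketch (not the paper's code): test local strong convexity of a
# weight-decay-regularized ReLU network loss along an SGD trajectory by
# estimating the smallest Hessian eigenvalue with shifted power iteration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny ReLU network and synthetic data; lam is the weight-decay coefficient.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
X, y = torch.randn(64, 10), torch.randn(64, 1)
lam = 1e-2

def reg_loss():
    mse = ((model(X) - y) ** 2).mean()
    wd = sum((p ** 2).sum() for p in model.parameters())
    return mse + lam * wd

def min_hessian_eig(n_iters=100, shift=10.0):
    """Estimate lambda_min of the Hessian of the regularized loss via power
    iteration on (shift * I - H); assumes shift exceeds lambda_max(H)."""
    params = list(model.parameters())
    loss = reg_loss()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    for _ in range(n_iters):
        # Hessian-vector product H v via a second differentiation.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        w = shift * v - hv          # power step on shift*I - H
        v = w / w.norm()
    hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
    hv = torch.cat([h.reshape(-1) for h in hv])
    return (v @ hv).item()          # Rayleigh quotient ~ lambda_min(H)

# Minibatch SGD on the regularized loss; periodically log the estimated
# smallest Hessian eigenvalue of the full regularized loss at the iterate.
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(200):
    idx = torch.randint(0, 64, (16,))
    opt.zero_grad()
    mse = ((model(X[idx]) - y[idx]) ** 2).mean()
    wd = sum((p ** 2).sum() for p in model.parameters())
    (mse + lam * wd).backward()
    opt.step()
    if step % 50 == 0:
        print(step, min_hessian_eig())
```

Note that the shifted power iteration converges to the eigenvector for lambda_min(H) only when the shift exceeds the largest Hessian eigenvalue, so in practice the shift should be set above an estimate of ‖H‖ (for example, obtained from a preliminary power iteration on H itself).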
Reviews: Piecewise Strong Convexity of Neural Networks
Originality: I am not convinced that the contributions of this paper are more significant than those of [1], which has already been cited in this paper. Specifically, in the comparison with [1] at Line 82, the authors state that these conclusions apply to a smaller set in weight space. I would appreciate it if the authors could quantify the difference and add a discussion section presenting the comparison in some mathematical form. Further, there have been quite a few papers that show convergence of GD on neural networks using something like strong convexity.

Clarity: The paper is written quite clearly and is easy to follow.
Meta-Review: Piecewise Strong Convexity of Neural Networks
This paper shows that the quadratic loss with weight decay of deep ReLU networks is piecewise strongly convex on a nonempty open set where every critical point is a local minimum and every local minimum is isolated. Initially the paper received mixed reviews, with two positive and one negative. On the positive side, the contribution is found to be quite significant because it analyzes realistic networks (deep and non-linear). On the other hand, one reviewer had issues with the proof and another with the experiments. The rebuttal addressed the issues raised by the reviewers, and the negative reviewer updated their score.